Abstract. - The Yule-Simon model has been used as a tool to describe the growth of
diverse systems, acquiring a paradigmatic character in many fields of research. Here we study
a modified Yule-Simon model that takes into account the full history of the system by means
of a hyperbolic memory kernel. We show how the memory kernel changes the properties
of preferential attachment and provide an approximate analytical solution for the frequency
distribution density as well as for the frequency-rank distribution.



In 1925 Yule [1] proposed a model to explain experimental data on the abundances of
biological genera [2]. Thirty years later, Simon introduced an elegant copy-and-growth model [3],
equivalent in spirit to Yule's model, to explain the observed power-law distribution of word
frequencies in texts [4-6]. In Simon's growth model, new words are added to a text (more
generally, a stream) with constant probability $p$ at each time step, whereas with complementary
probability $\bar{p} = 1 - p$ an already occurred word is chosen uniformly from within the
already formed text (stream). This model yields a power-law distribution density for word
frequencies, $P(k) \sim k^{-\beta}$ with $\beta = 1 + 1/\bar{p}$. The same mechanism is at play in the
preferential attachment (PA) model for growing networks proposed by Barabási and Albert in
their pioneering article [7]. In that case, a network is constructed by progressively adding
new nodes and linking them to existing nodes with a probability proportional to their current
connectivity. Yule-Simon processes and PA schemes are closely related to each other, and a
mapping between them has been provided by Bornholdt and Ebel [8].

In the original Yule-Simon process, the metaphor of text construction is somewhat
misleading because in that process there is no notion of temporal ordering. All existing words
are equivalent and in many respects everything goes as in a Polya urn model [9]. However,
the notion of temporal ordering may play an important role in determining the dynamics of
many real systems. From this perspective it is interesting to investigate models where temporal
ordering is explicitly taken into account. A first attempt in this direction has been provided
by Dorogovtsev and Mendes (DM) [10], who studied a generalization of the Barabási-Albert
Fig. 1 - Yule-Simon process with a fat-tailed memory kernel.



model by introducing a notion of aging for nodes. Each node carries a temporal marker
recording its time of arrival into the network, and its probability of being linked to newly added
nodes is proportional to its current connectivity weighted by a power law of its age. Another
recent example has been proposed in [11] in relation to the recent phenomenon of
collaborative tagging [12]: new web sites have appeared where users independently associate descriptive
keywords - called tags - with disparate resources ranging from web pages to photographs. A
sort of tag dynamics develops, eventually yielding a fat-tailed distribution of tag frequencies.
In order to explain such phenomenology, a generalization of the Yule-Simon process has been
introduced [11], which explicitly takes into account the time ordering of tags. Specifically,
a hyperbolic memory kernel has been introduced to weight the probability of copying an
existing tag, affording a remarkable agreement with experimental data.

In this Letter we show that the memory kernel induces a non-trivial change of the properties 
of PA with respect to the original Yule-Simon process as well as to the DM model with aging. 
Moreover, we analytically investigate the generalization of the Yule-Simon model and provide 
an approximate solution for the frequency distribution density as well as for the frequency-rank 
distribution. 

The model we investigate is defined as follows. We start with $n_0$ words. At every time
step $t$ a new word may be invented with probability $p$ and appended to the text, while with
probability $\bar{p} = 1 - p$ one word is copied from the text, going back in time by $i$ steps with
a probability that decays with $i$ as $Q(i) = C(t)/(\tau + i)$, as shown in Fig. 1. $C(t)$ is a
logarithmically time-dependent normalization factor and $\tau$ is a characteristic time-scale over
which recently added words have comparable probabilities.
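
As an illustration, the update rule just described can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the simulation code behind the figures: the function name `simulate_memory_yule_simon`, its arguments, the integer word identifiers and the choice of starting from `n0` distinct words are all hypothetical. Since the kernel weights $1/(i + \tau)$ do not depend on $t$, only their running sum has to be updated, and its inverse plays the role of the normalization $C(t)$.

```python
import bisect
import random


def simulate_memory_yule_simon(steps, p=0.4, tau=0.0, n0=10, seed=0):
    """Grow a stream of integer word ids for `steps` time steps.

    With probability p a brand-new word is appended; otherwise the word found
    i steps back is copied, where the lag i is drawn with probability
    proportional to 1 / (i + tau) over the whole past, i = 1..t.
    """
    rng = random.Random(seed)
    stream = list(range(n0))      # assumption: n0 >= 1 distinct initial words
    next_word = n0
    cum = []                      # cum[i - 1] = sum_{j <= i} 1 / (j + tau)
    total = 0.0
    for _ in range(steps):
        t = len(stream)
        while len(cum) < t:       # extend cumulative kernel weights up to lag t
            total += 1.0 / (len(cum) + 1 + tau)
            cum.append(total)
        if rng.random() < p:
            stream.append(next_word)          # invent a new word
            next_word += 1
        else:
            # Inverse-CDF draw of the lag; 1 / cum[t - 1] acts as C(t).
            u = rng.random() * cum[t - 1]
            lag = min(bisect.bisect_right(cum, u, 0, t), t - 1) + 1
            stream.append(stream[t - lag])    # copy the word `lag` steps back
    return stream
```

Each copy step costs a binary search over the past, so a run of $10^6$ steps remains feasible with this approach; a `collections.Counter` over the returned stream gives the word frequencies.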

The first important observation concerns the deviations of our model from the pure PA rule
of the original Yule-Simon model. An elegant and efficient way to check for deviations from
PA was suggested by Newman [13]. In Simon's model, the probability of choosing an existing
word that has already occurred $k$ times at time $t$ is $\bar{p}\,k\,\pi(k,t)$, where $\pi(k,t)$ is the fraction
of words with frequency $k$ at time $t$. In order to ascertain whether a PA mechanism might
be at work, we construct the histogram of the frequencies of words that have been copied,
weighting the contribution of each word according to the factor $1/\pi(k,t)$. If this histogram
displays a direct proportionality to the frequency $k$, then one might be observing a PA-driven
growth. For our model, the numerical results in Fig. 2 show that the chosen form of the
memory kernel leads to a sub-linear attaching probability. The same kind of sub-linearity has
Fig. 2 - Deviations from the preferential attachment rule (Simon's model), in the case of our model
and the DM model. For all curves, $p = 0.4$ and $10^6$ steps were simulated. Finite-size effects are responsible
for the drop at high frequencies, as extensively discussed in Ref. [13].



been observed in the growth dynamics of the Wikipedia network [14]. Conversely, the DM
model with hyperbolic kernel (a limiting case for the analysis of Ref. [10]) displays no clear
dependence on $k$.
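
The weighted histogram used in this test can be sketched as follows. This is a hedged illustration of the procedure as described in the text, not the authors' code: the function name `attachment_histogram` and the representation of the text as a plain sequence of word identifiers are assumptions. Every copy event of a word that currently has frequency $k$ contributes $1/\pi(k,t)$, with $\pi(k,t)$ measured just before the copy.

```python
from collections import Counter, defaultdict


def attachment_histogram(stream):
    """Histogram over k of copy events, each weighted by 1 / pi(k, t)."""
    freq = Counter()      # word -> frequency so far
    n_with = Counter()    # k -> number of distinct words with frequency k
    hist = defaultdict(float)
    for word in stream:
        k = freq[word]    # Counter lookup returns 0 for unseen words
        if k > 0:         # the word already exists, so this is a copy event
            pi = n_with[k] / len(freq)   # fraction of words with frequency k
            hist[k] += 1.0 / pi
            n_with[k] -= 1
        freq[word] = k + 1
        n_with[k + 1] += 1
    return dict(hist)
```

For pure preferential attachment the returned values should grow roughly linearly with $k$ (up to finite-size effects at large $k$), whereas a sub-linear growth signals a deviation of the kind discussed above.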

In order to get a deeper insight into the phenomenology of the model we present an
analytical study aimed at computing the approximate functional form of the probability distribution
of word frequencies as well as the corresponding frequency-rank distribution. In the following
we shall write the normalization factor as $C$, with no explicit mention of its time dependence.
We also define $\alpha(t) = \bar{p}\,C(t)$, and we will similarly refer to it as $\alpha$. We assume that word X
occurred at time $t$ for the first time, and we ask what is the probability $P(\Delta t)$ that the next
occurrence of X happens at time $t + \Delta t$, with $\Delta t \geq 1$.

If $\Delta t = 1$, $P(\Delta t)$ is the probability of replicating the previous word, i.e. the product
of the probability $\bar{p}$ of copying an old word and the probability of choosing the immediately
preceding word ($i = 1$) computed according to the chosen memory kernel, $Q(1) = C/(\tau + 1)$.
This gives

$$P(1) = \frac{\bar{p}\,C}{\tau + 1} = \frac{\alpha}{\tau + 1} \qquad (1)$$

For $\Delta t > 1$, $P(\Delta t)$ can be computed as the product of the probabilities of not choosing
word X for $\Delta t - 1$ consecutive steps, multiplied by the probability of choosing word X at
step $\Delta t$. In order not to choose word X at the first step, one has to either append a new
word (probability $p$) or copy an existing word (probability $\bar{p}$) which is not X (probability
$1 - C/(\tau + 1)$).

Finally, under the approximation that $C$ is constant from step to step, i.e. $\Delta t \ll t$, we can
write the return probability as the product

$$P(\Delta t) = \frac{\bar{p}\,C}{\tau + \Delta t} \prod_{i=1}^{\Delta t - 1} \left[ p + \bar{p} \left( 1 - \frac{C}{\tau + i} \right) \right] \qquad (2)$$



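Taking the logarithm of $P(\Delta t)$, we can write the above product as the sum

$$\ln P(\Delta t) \simeq -\alpha \sum_{i=1}^{\Delta t - 1} \frac{1}{\tau + i} + \ln \frac{\alpha}{\tau + \Delta t} \qquad (3)$$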



where we used the fact that $\alpha \ll 1$ for $t \gg 1$.

By using the approximate expression $\ln P(\Delta t) \simeq \int \left[ \ln P(\Delta t' + 1) - \ln P(\Delta t') \right] d\Delta t'$ we
obtain

$$P(\Delta t) \simeq \alpha\,(1 + \tau)^{\alpha}\,(\tau + \Delta t)^{-\alpha - 1} \qquad (4)$$

derived under the assumption that $t \gg \Delta t \gg 1$. The estimated value of $P(\Delta t)$ depends on
time through $\alpha$, so that the probability distribution of intervals $\Delta t$, which turns out to be
correctly normalized, is non-stationary.
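
The quality of this power-law approximation can be checked numerically against the exact product. The sketch below is only illustrative and treats $\alpha$ as a constant, as the derivation does; the function names are hypothetical, and the product is written using the identity $p + \bar{p}\,(1 - C/(\tau + i)) = 1 - \alpha/(\tau + i)$.

```python
import math


def return_prob_product(dt, alpha, tau=0.0):
    """Product form: alpha / (tau + dt) * prod_{i=1}^{dt-1} (1 - alpha / (tau + i))."""
    log_p = math.log(alpha / (tau + dt))
    for i in range(1, dt):
        log_p += math.log1p(-alpha / (tau + i))   # accumulate in log space
    return math.exp(log_p)


def return_prob_power_law(dt, alpha, tau=0.0):
    """Closed form: alpha * (1 + tau)**alpha * (tau + dt)**(-alpha - 1)."""
    return alpha * (1.0 + tau) ** alpha * (tau + dt) ** (-alpha - 1.0)
```

The two expressions coincide at $\Delta t = 1$ by construction, and for $\alpha = 0.1$, $\tau = 0$ they remain within ten percent of each other at $\Delta t = 1000$, which gives a feeling for the error introduced by replacing the sum with an integral.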

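We now focus, for simplicity, on the case $\tau = 0$. At any given time $t$, the characteristic return
time $\langle \Delta t \rangle$ can be computed by using Eq. (4):

$$\langle \Delta t \rangle = \sum_{\Delta t = 1}^{t} P(\Delta t)\,\Delta t \simeq \frac{\alpha}{1 - \alpha}\, t^{1 - \alpha} \qquad (5)$$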
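
As a quick sanity check of this estimate, the sum can be evaluated directly for $\tau = 0$, where the return probability reduces to $\alpha\,\Delta t^{-\alpha - 1}$. The snippet below is a sketch with hypothetical function names; it assumes a constant $\alpha$ and an upper summation limit equal to the current time $t$.

```python
def mean_return_time_sum(t, alpha):
    """Direct sum of dt * P(dt) with P(dt) = alpha * dt**(-alpha - 1), tau = 0."""
    return sum(alpha * dt ** (-alpha) for dt in range(1, t + 1))


def mean_return_time_closed(t, alpha):
    """Closed-form estimate alpha / (1 - alpha) * t**(1 - alpha)."""
    return alpha / (1.0 - alpha) * t ** (1.0 - alpha)
```

For $\alpha = 0.1$ and $t = 10^4$ the two values differ by well under one percent, so the continuum estimate is adequate at the times considered here.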



In a continuum description the frequency $k_i$ of a given word $i$ will change according to
the rate equation

$$\frac{dk_i}{dt} = \bar{p}\,\Pi_i \qquad (6)$$

where $\Pi_i$ is the probability of picking up a previous occurrence of word $i$. With our choice of
the memory kernel, the exact value of $\Pi_i$ is given by the sum

$$\Pi_i = \sum_{j=1}^{k_i} \frac{C(t)}{t - t_j} \qquad (7)$$



where $t_j$ ($j = 1, 2, \ldots, k_i$) are the times of occurrence of word $i$.

We adopt a mean-field approach and assume that the above sum can be written as the
frequency $k_i$ times the average value of the term $(t - t_j)^{-1}$ over the occurrence times $t_j$.
As shown in Fig. 3a, this is supported by numerical evidence, so that we can write (dropping
the word index $i$ from here onward):

$$\Pi_i = \sum_{j=1}^{k_i} \frac{C}{t - t_j} = C\,k_i \left\langle \frac{1}{t - t_j} \right\rangle_j \qquad (8)$$



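where $\langle \cdot \rangle_j$ denotes the average over the $k_i$ occurrences of word $i$. Furthermore, we assume
that the average is dominated by the contribution of the most recent occurrence of word $i$, at
time $t_{k_i}$: $\langle (t - t_j)^{-1} \rangle_j \simeq (t - t_{k_i})^{-1}$. We replace $t - t_{k_i}$ with the typical return interval for
word $i$, and use Eq. (5) to estimate the latter, obtaining:

$$\left\langle \frac{1}{t - t_j} \right\rangle_j \simeq \frac{1}{\langle \Delta t \rangle} \simeq \frac{1 - \alpha}{\alpha}\, \frac{1}{t^{1 - \alpha}} \qquad (9)$$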
Fig. 3 - a) Rate $\Pi_i$ of Eq. (7) for a given word $i$ having frequency $k_i$ at time $t$ ($p = 0.05$, $n_0 = 10$,
$t = 30000$). b) Memory kernel of Eq. (10) averaged over the times of occurrence $t_j$ and over about 2000
realizations of the process ($p = 0.05$, $n_0 = 10$, $t = 5 \cdot 10^3, 10^4, 2 \cdot 10^4, 3 \cdot 10^4, 5 \cdot 10^4$). Values are
shown for a word of given frequency $k = 200$ (dots), a word of frequency $k = 500$ (crosses) and for
the average over all frequencies (squares). Numerical error bars are within the size of data markers.
The two curves are obtained by fitting $\Omega$ in Eq. (10) against numerical data. The fitted continuous line
sets the value of $\Omega$ used from Eq. (11) onwards.



which has a (sub-linear, since $\alpha > 0$) power-law dependence on $t$ and a slower (logarithmic)
time-dependence through $\alpha$. Fig. 3b shows that the above expression captures the correct
temporal dependence of the average $\langle (t - t_j)^{-1} \rangle_j$ for a given frequency $k_i$, provided that a
constant factor $\Omega$ is introduced, as follows:

$$\left\langle \frac{1}{t - t_j} \right\rangle_j \simeq \frac{1}{\Omega}\, \frac{1 - \alpha}{\alpha}\, \frac{1}{t^{1 - \alpha}} \qquad (10)$$

The need for a corrective factor $\Omega$ is a consequence of our simplifying assumptions, namely
our mean-field approximation, the fact that we ignored all occurrences of word $i$ but the very
last, and the approximations underlying our estimate of the return time $\Delta t$. Moreover, as
shown in Fig. 3b, $\Omega$ depends on the frequency $k_i$ of the selected word $i$. In order to keep
only the linear dependence of the kernel on $k_i$ we approximate $\Omega$ with its average value over
$k$, numerically estimated as $\Omega \simeq 1.52$ (see Fig. 3b). While this is certainly a rather crude
approximation, it appears to work remarkably well, as we will show in the following (Figs. 4
and 5).

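We introduce Eq. (10) and Eq. (8) into the rate Eq. (6), obtaining:

$$\frac{dk}{dt} = \bar{p}\,C\,k\, \frac{1}{\Omega}\, \frac{1 - \alpha}{\alpha}\, t^{\alpha - 1} = \frac{1 - \alpha}{\Omega}\, k\, t^{\alpha - 1} \qquad (11)$$

We integrate Eq. (11), again neglecting the slow time-dependence of $\alpha$, from time $t_i$, when word
$i$ appeared for the first time (with frequency 1), up to the final time $t$, when word $i$ has reached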
Fig. 4 - Frequency probability distribution density $P(k)$ of word occurrence. Numerical data (dots,
averaged over 50 realizations and binned) are in very good agreement with Eq. (14) (solid line) ($p = 0.05$,
$n_0 = 10$, $t = 30000$, $\Omega = 1.52$). The dashed line is provided as a guide for the eye.

Fig. 5 - Frequency-rank distribution $P(R)$. Upper curves: numerical data (dots, average over 50
realizations) are compared against the prediction of Eq. (15) (solid line) ($p = 0.05$, $n_0 = 10$, $t = 30000$,
$\Omega = 1.52$). Here the value of $\Omega$ is uniquely set by our numerics, as explained in Fig. 3b. Lower
curves (shifted one decade downwards): a single realization of our process (squares) is fitted with
respect to $\Omega$ against Eq. (15) (dashed line), yielding $\Omega = 1.46$.



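frequency $k_i$,

$$\int_{1}^{k_i} \frac{dk'}{k'} = \frac{1 - \alpha}{\Omega} \int_{t_i}^{t} dt'\, t'^{\alpha - 1} \qquad (12)$$

Performing the integration we get the stretched exponential dependence

$$k_i = \exp\left[ \frac{1 - \alpha}{\Omega \alpha}\, t^{\alpha} \right] \cdot \exp\left[ -\frac{1 - \alpha}{\Omega \alpha}\, t_i^{\alpha} \right] = A\, e^{-K t_i^{\alpha}} \qquad (13)$$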



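where $K = \frac{1 - \alpha}{\Omega \alpha}$ and $A = e^{K t^{\alpha}}$.

The probability distribution density for word frequencies $P(k)$ can now be computed as [15, 16]:

$$P(k) = \frac{p}{n_0 + p t}\, \frac{1}{K \alpha\, k} \left[ \frac{\ln(A/k)}{K} \right]^{\frac{1}{\alpha} - 1} \qquad (14)$$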

and is in very good agreement with numerical evidence, as shown in Fig. 4 (upper curves),
where it is worth noticing that the value of $\Omega$ is uniquely set by our numerics. The
corresponding frequency-rank distribution is:

$$P(R) = \frac{A}{n_0 + t} \exp\left[ -K \left( \frac{R - n_0}{p} \right)^{\alpha} \right] \qquad (15)$$



Fig. 5 shows that the above equation is in fair agreement with numerical evidence.
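
For readers who wish to evaluate these predictions, the following sketch computes the frequency density and the frequency-rank distribution. It rests on several assumptions that should be checked against the formulas above: $\tau = 0$, $K = (1 - \alpha)/(\Omega \alpha)$, $A = \exp(K t^{\alpha})$, a word of rank $R$ identified with an arrival time $(R - n_0)/p$, and $\alpha = \bar{p}\,C(t)$ with $C(t)$ taken as the inverse of the harmonic sum $\sum_{i=1}^{t} 1/i$. All function names are hypothetical.

```python
import math


def alpha_from_time(t, p):
    """alpha = (1 - p) * C(t) for tau = 0, with C(t) = 1 / sum_{i=1}^{t} 1/i."""
    harmonic = sum(1.0 / i for i in range(1, t + 1))
    return (1.0 - p) / harmonic


def stretched_constants(t, alpha, omega):
    """Constants K and A of the stretched-exponential growth law."""
    K = (1.0 - alpha) / (omega * alpha)
    A = math.exp(K * t ** alpha)
    return K, A


def frequency_density(k, t, p, alpha, omega, n0):
    """P(k) = p / (n0 + p t) / (K alpha k) * (ln(A / k) / K)**(1/alpha - 1)."""
    K, A = stretched_constants(t, alpha, omega)
    prefactor = p / (n0 + p * t) / (K * alpha * k)
    return prefactor * (math.log(A / k) / K) ** (1.0 / alpha - 1.0)


def rank_distribution(R, t, p, alpha, omega, n0):
    """P(R) = A / (n0 + t) * exp(-K ((R - n0) / p)**alpha), for R >= n0."""
    K, A = stretched_constants(t, alpha, omega)
    return A / (n0 + t) * math.exp(-K * ((R - n0) / p) ** alpha)
```

With the parameters quoted in the figure captions ($p = 0.05$, $n_0 = 10$, $t = 30000$, $\Omega = 1.52$), `alpha_from_time` gives a value of $\alpha$ between 0.08 and 0.09, which can then be passed to the other two functions.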

In this Letter we have shown how the introduction of a memory kernel drastically changes
the properties of PA with respect to the original Yule-Simon process as well as the
Dorogovtsev-Mendes model with aging [10].

In order to assess the role of the memory kernel we have presented a continuum approach to
a modified Yule-Simon model. The presence of a long-term memory kernel makes a rigorous
treatment non-trivial. Our approach makes use of some assumptions (sometimes rough, but
numerically verified) concerning especially the functional form of the averaged memory kernel,
both as a function of time and of word frequency. We require a single phenomenological
parameter ($\Omega$), for which we presently have no theoretical estimate. Nevertheless our approach
affords an excellent agreement between analytical and numerical results for the probability
distribution density $P(k)$. The frequency-rank distribution $P(R)$ appears to be much more
sensitive to the approximations we made, but the agreement between numerics and theory
is nevertheless reasonable. This suggests that our theoretical treatment is
capturing some of the important statistical features of the model.

We wish to remark that the frequency probability density $P(k)$ displayed by the model
(Fig. 4) could be easily confused with a power-law behavior with exponent $-2$, as in the original
Yule-Simon model with $p \ll 1$, and a simple PA mechanism could be inferred. Instead, as
shown for the case at hand, more refined indicators (e.g. that of Fig. 2) can tell apart different
underlying mechanisms of growth. This should be read as a general warning against reading
an apparent power-law behavior for $P(k)$ as the signature of a PA mechanism at play.

The approach described here could be extended to the more complex case of $\tau \neq 0$.
In this respect, several problems remain open: does $\tau$ induce a relevant time scale? Is it
asymptotically relevant or does it only affect the dynamics on short time-scales? Does the
limit $\tau \gg t$ fall in the same universality class as the Yule-Simon model without memory?